Crosstable is a package centered on a single function, crosstable, which easily computes descriptive statistics on datasets.
It supports the tidyselect syntax for selecting variables (and more) and is interfaced with the officer package to create automated reports.
install.packages("devtools")
devtools::install_github("DanChaltiel/crosstable", build_vignettes=TRUE)
If you run into any installation problem, try reading the wiki or file an issue.
You can use the vignettes:
vignette("crosstable") for global use and parameterization
vignette("crosstable-selection") for variable selection
vignette("crosstable-report") for reporting with officer or Rmarkdown
With no argument other than the dataset, the function will summarise all numeric variables with statistics (min+max, mean+sd, median+IQR, N+NA) and all categorical variables with counts and percentages.
library(crosstable)
library(dplyr) #for the pipe
crosstable(iris)
#>    .id          label        variable   value
#> 1  Sepal.Length Sepal.Length Min / Max  4.3 / 7.9
#> 2  Sepal.Length Sepal.Length Med [IQR]  5.8 [5.1;6.4]
#> 3  Sepal.Length Sepal.Length Mean (std) 5.8 (0.8)
#> 4  Sepal.Length Sepal.Length N (NA)     150 (0)
#> 5  Sepal.Width  Sepal.Width  Min / Max  2.0 / 4.4
#> 6  Sepal.Width  Sepal.Width  Med [IQR]  3.0 [2.8;3.3]
#> 7  Sepal.Width  Sepal.Width  Mean (std) 3.1 (0.4)
#> 8  Sepal.Width  Sepal.Width  N (NA)     150 (0)
#> 9  Petal.Length Petal.Length Min / Max  1.0 / 6.9
#> 10 Petal.Length Petal.Length Med [IQR]  4.3 [1.6;5.1]
#> 11 Petal.Length Petal.Length Mean (std) 3.8 (1.8)
#> 12 Petal.Length Petal.Length N (NA)     150 (0)
#> 13 Petal.Width  Petal.Width  Min / Max  0.1 / 2.5
#> 14 Petal.Width  Petal.Width  Med [IQR]  1.3 [0.3;1.8]
#> 15 Petal.Width  Petal.Width  Mean (std) 1.2 (0.8)
#> 16 Petal.Width  Petal.Width  N (NA)     150 (0)
#> 17 Species      Species      setosa     50 (33.33%)
#> 18 Species      Species      versicolor 50 (33.33%)
#> 19 Species      Species      virginica  50 (33.33%)
You can select specific columns using names and helper functions, and request specific summary statistics using funs and funs_arg. The by argument lets you specify a grouping variable. Here, as the mtcars2 dataset has labels, they are also included in the crosstable.
The as_flextable function outputs an HTML table that can be customized at will (see the flextable package) and embedded in a Word document (see the officer package).
library(tidyverse)
ct1 = crosstable(mtcars2, qsec, ends_with("t"), starts_with("c"), by=vs,
funs=c(mean, quantile), funs_arg=list(probs=c(.25,.75), digits=3))
ct1 %>% as_flextable(keep_id=TRUE)
| .id | label | variable | Engine: straight | Engine: vshaped |
|---|---|---|---|---|
| qsec | 1/4 mile time | mean | 19.334 | 16.694 |
| | | quantile 25% | 18.602 | 15.995 |
| | | quantile 75% | 19.975 | 17.415 |
| drat | Rear axle ratio | mean | 3.859 | 3.392 |
| | | quantile 25% | 3.718 | 3.070 |
| | | quantile 75% | 4.080 | 3.702 |
| wt | Weight (1000 lbs) | mean | 2.611 | 3.689 |
| | | quantile 25% | 2.001 | 3.236 |
| | | quantile 75% | 3.209 | 3.844 |
| cyl | Number of cylinders | 4 | 10 (90.91%) | 1 (9.09%) |
| | | 6 | 4 (57.14%) | 3 (42.86%) |
| | | 8 | 0 (0%) | 14 (100.00%) |
| carb | Number of carburetors | mean | 1.786 | 3.611 |
| | | quantile 25% | 1.000 | 2.250 |
| | | quantile 75% | 2.000 | 4.000 |
The margin argument controls how percentages are calculated (by row, by column, or both), while the total argument adds total rows or columns.
#margin and totals
ct2 = crosstable(mtcars2, disp, vs, by=am, margin=c("row", "col"), total="both")
ct2 %>% as_flextable
| label | variable | Transmission: auto | Transmission: manual | Total |
|---|---|---|---|---|
| Displacement (cu.in.) | Min / Max | 120.1 / 472.0 | 71.1 / 351.0 | 71.1 / 472.0 |
| | Med [IQR] | 275.8 [196.3;360.0] | 120.3 [79.0;160.0] | 196.3 [120.8;326.0] |
| | Mean (std) | 290.4 (110.2) | 143.5 (87.2) | 230.7 (123.9) |
| | N (NA) | 19 (0) | 13 (0) | 32 (0) |
| Engine | straight | 7 (50.00% / 36.84%) | 7 (50.00% / 53.85%) | 14 (43.75%) |
| | vshaped | 12 (66.67% / 63.16%) | 6 (33.33% / 46.15%) | 18 (56.25%) |
| | Total | 19 (59.38%) | 13 (40.62%) | 32 (100.00%) |
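To see where the two percentages in each cell come from, the row and column proportions can be recomputed with base R. This is only a sketch on the plain mtcars dataset, not a call to crosstable; it assumes that vs == 1 codes straight engines and am == 0 codes automatic transmissions, as in the standard mtcars documentation.

```r
# Cross-tabulate engine shape (vs) against transmission (am) on the raw data
tab <- table(vs = mtcars$vs, am = mtcars$am)

# Row percentages: each row sums to 100%
row_pct <- round(100 * prop.table(tab, margin = 1), 2)
# Column percentages: each column sums to 100%
col_pct <- round(100 * prop.table(tab, margin = 2), 2)

# Straight engines (vs == 1) with automatic transmission (am == 0)
tab["1", "0"]      # count
row_pct["1", "0"]  # share among straight engines
col_pct["1", "0"]  # share among automatic cars
```

These three values correspond to the "7 (50.00% / 36.84%)" cell of the table: the row percentage comes first and the column percentage second, following the order given in margin.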
For variable selection, you can use predicate functions. It is good practice to wrap these in where. If the grouping variable is numeric, correlation coefficients will be calculated.
Using the test argument, you can perform a test of each variable against the grouping variable. Beware: automatic testing should only be done in an exploratory context, as it would otherwise cause extensive alpha inflation.
ct3 = crosstable(mtcars2, where(is.numeric), by=hp, test=TRUE)
ct3 %>% as_flextable
| label | variable | Gross horsepower | test |
|---|---|---|---|
| Miles/(US) gallon | pearson | -0.78 | p value: <0.0001 |
| Displacement (cu.in.) | pearson | 0.79 | p value: <0.0001 |
| Rear axle ratio | pearson | -0.45 | p value: 0.0100 |
| Weight (1000 lbs) | pearson | 0.66 | p value: <0.0001 |
| 1/4 mile time | pearson | -0.71 | p value: <0.0001 |
| Number of carburetors | pearson | 0.75 | p value: <0.0001 |
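The first row of this table can be cross-checked with base R. The sketch below uses the plain mtcars dataset and stats::cor.test(); it illustrates what a Pearson coefficient and its p-value are, and is not a description of how crosstable computes them internally.

```r
# Pearson correlation between fuel consumption (mpg) and horsepower (hp)
res <- cor.test(mtcars$mpg, mtcars$hp, method = "pearson")

r <- unname(res$estimate)  # correlation coefficient
p <- res$p.value           # p-value for the null hypothesis of zero correlation

round(r, 2)  # coefficient rounded as in the table
p < 0.0001   # whether the p-value falls below the reporting threshold
```

Running the same test on every numeric column is exactly the kind of multiple testing that the warning above refers to, so any p-value obtained this way should be read as exploratory.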
The predicate function can be a lambda function, using .x as the variable name.
Using the effect argument, you can calculate effect sizes for all numeric variables and for categorical variables with exactly 2 levels.
ct4 = crosstable(mtcars2, where(~is.numeric(.x) && mean(.x)>50), by=vs, effect=TRUE)
ct4 %>% as_flextable
| label | variable | Engine: straight | Engine: vshaped | effect |
|---|---|---|---|---|
| Displacement (cu.in.) | Min / Max | 71.1 / 258.0 | 120.3 / 472.0 | Difference in means (Welch CI) (straight minus vshaped): -174.69 |
| | Med [IQR] | 120.5 [83.0;162.4] | 311.0 [275.8;360.0] | |
| | Mean (std) | 132.5 (56.9) | 307.1 (106.8) | |
| | N (NA) | 14 (0) | 18 (0) | |
| Gross horsepower | Min / Max | 52.0 / 123.0 | 91.0 / 335.0 | Difference in means (Welch CI) (straight minus vshaped): -98.37 |
| | Med [IQR] | 96.0 [66.0;109.8] | 180.0 [156.2;226.2] | |
| | Mean (std) | 91.4 (24.4) | 189.7 (60.3) | |
| | N (NA) | 14 (0) | 18 (0) | |
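The effect reported for displacement can be recomputed by hand. The following sketch uses base R on the plain mtcars dataset, assuming that vs == 1 codes straight engines; stats::t.test() applies the Welch correction by default, which is presumably what the "Welch CI" label refers to.

```r
# Displacement split by engine shape
straight <- mtcars$disp[mtcars$vs == 1]
vshaped  <- mtcars$disp[mtcars$vs == 0]

# Welch two-sample t-test (var.equal = FALSE is the default)
res <- t.test(straight, vshaped)

# Point estimate: straight minus vshaped
diff_means <- mean(straight) - mean(vshaped)

diff_means    # difference in means
res$conf.int  # Welch 95% confidence interval for that difference
```

The sign depends on the order of the groups: swapping the two arguments gives the same magnitude with the opposite sign.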
Finally, you can describe survival data using the Surv object from the survival package. The times and followup arguments allow for more control.
This is only possible using the formula syntax of variable selection, which allows more complex selection and is written as var1 + var2 ~ group.
library(survival)
ct5 = crosstable(aml, Surv(time, status) ~ x, times=c(0,15,30,150), followup=TRUE)
ct5 %>% as_flextable
| label | variable | x: Maintained | x: Nonmaintained |
|---|---|---|---|
| Surv(time, status) | t=0 | 1.00 (0/11) | 1.00 (0/12) |
| | t=15 | 0.82 (2/8) | 0.58 (5/7) |
| | t=30 | 0.61 (2/5) | 0.29 (3/4) |
| | t=150 | 0.18 (3/1) | 0 (3/0) |
| | Median follow up [min ; max] | 103 [13 ; 161] | NA [16 ; 45] |
| | Median survival | 31 | 23 |
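For comparison, the survival probabilities at the chosen time points can be obtained directly from the survival package. This is a sketch of the underlying Kaplan-Meier computation, assuming the aml dataset shipped with survival; it is not a description of crosstable's internals.

```r
library(survival)

# Kaplan-Meier estimate for each maintenance group
fit <- survfit(Surv(time, status) ~ x, data = aml)

# Survival at the requested time points; extend = TRUE keeps time points
# that lie beyond the last observation of a group
s <- summary(fit, times = c(0, 15, 30, 150), extend = TRUE)

# One row per group and time point
km <- data.frame(group    = s$strata,
                 time     = s$time,
                 survival = round(s$surv, 2),
                 events   = s$n.event,
                 at_risk  = s$n.risk)
km
```

Each cell of the table above combines the survival probability with the number of events and the number at risk, which here appear as separate columns.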
crosstable is a rewrite of the awesome biostat2 package written by David Hajage. The user interface is quite different but the concept is the same.
Thanks David!